Who's Who? Identifying Concepts and Entities across Multiple Documents
Abstract
A number of research and software development groups have developed technology for identifying terms and names in documents and associating them with concepts and named entities, but few have addressed coreference of concepts and entities across multiple documents in a collection. Cross-document coreference is challenging, since a collection of documents consists of multiple discourse contexts, with a many-to-many correspondence between terms and names on one hand and the concepts and entities they refer to on the other. In this paper we describe extensions to our intra-document term and name identification for coreferencing concepts and entities across documents.
Similar resources
Solving the "Who's Mark Johnson Puzzle": Information Extraction Based Cross Document Coreference
Cross Document Coreference (CDC) is the problem of resolving the underlying identity of entities across multiple documents and is a major step for document understanding. We develop a framework to efficiently determine the identity of a person based on extracted information, which includes unary properties such as gender and title, as well as binary relationships with other named entities such ...
Is Hillary Rodham Clinton The President? Disambiguating Names Across Documents
A number of research and software development groups have developed name identification technology, but few have addressed the issue of cross-document coreference, or identifying the same named entities across documents. In a collection of documents, where there are multiple discourse contexts, there exists a many-to-many correspondence between names and entities, making it a challenge to automa...
Identifying Similar and Co-referring Documents Across Languages
This paper presents a methodology for finding similarity and co-reference of documents across languages. The similarity between documents is identified from the content of the whole document, and co-reference of documents is found from the named entities present in each document. Here we use the Vector Space Model (VSM) for identifying both similarity and co-reference. This can be ...
Conceptual and institutional gaps: understanding how the WHO can become a more effective cross-sectoral collaborator
BACKGROUND: Two themes consistently emerge from the broad range of academics, policymakers and opinion leaders who have proposed changes to the World Health Organization (WHO): that reform efforts are too slow, and that they do too little to strengthen WHO's capacity to facilitate cross-sectoral collaboration. This study seeks to identify possible explanations for the challenges WHO faces in add...
A Bibliometric Analysis of Open Strategy: A new Concept in Strategic Management
Strategy development has traditionally been an exclusive and secretive matter. However, some organizations have recently used IT to enable openness for making a strategy. The aim of this paper was to research the trends of open strategy by applying bibliometric mapping. The method involves identifying open strategy-related documents, including a sample of 1717 existing documents from 2000 to 20...